Overview

Dataset statistics

Number of variables: 12
Number of observations: 891
Missing cells: 14
Missing cells (%): 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 77.7 KiB
Average record size in memory: 89.3 B

Variable types

Numeric: 5
Categorical: 7

Alerts

Name has a high cardinality: 891 distinct values (High cardinality)
Ticket has a high cardinality: 681 distinct values (High cardinality)
Pclass is highly correlated with Fare (High correlation)
Fare is highly correlated with Pclass (High correlation)
Survived is highly correlated with Sex (High correlation)
Sex is highly correlated with Survived (High correlation)
Pclass is highly correlated with Fare and 1 other field (High correlation)
Age is highly correlated with Age_cut (High correlation)
SibSp is highly correlated with Parch (High correlation)
Parch is highly correlated with SibSp (High correlation)
Embarked is highly correlated with Pclass (High correlation)
Age_cut is highly correlated with Age (High correlation)
Age_cut has 14 (1.6%) missing values (Missing)
PassengerId is uniformly distributed (Uniform)
Name is uniformly distributed (Uniform)
Ticket is uniformly distributed (Uniform)
PassengerId has unique values (Unique)
Name has unique values (Unique)
SibSp has 608 (68.2%) zeros (Zeros)
Parch has 678 (76.1%) zeros (Zeros)
Fare has 15 (1.7%) zeros (Zeros)

Reproduction

Analysis started: 2022-04-23 13:51:40.247477
Analysis finished: 2022-04-23 13:51:55.644123
Duration: 15.4 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

PassengerId
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 446
Minimum: 1
Maximum: 891
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 45.5
Q1: 223.5
Median: 446
Q3: 668.5
95-th percentile: 846.5
Maximum: 891
Range: 890
Interquartile range (IQR): 445

Descriptive statistics

Standard deviation: 257.353842
Coefficient of variation (CV): 0.5770265516
Kurtosis: -1.2
Mean: 446
Median Absolute Deviation (MAD): 223
Skewness: 0
Sum: 397386
Variance: 66231
Monotonicity: Strictly increasing
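As a sanity check, the quantile and descriptive statistics reported for PassengerId can be recomputed with Python's standard library. This is only a sketch under stated assumptions: it assumes the column holds the integers 1 through 891 (consistent with the reported minimum, maximum, distinct count and sum, but not stated outright in the report), that quantiles use linear interpolation, and that the variance is the sample (n - 1) variance.

```python
import math
import statistics

# Assumption: PassengerId is simply 1, 2, ..., 891.
ids = list(range(1, 892))

mean = statistics.mean(ids)
median = statistics.median(ids)
variance = statistics.variance(ids)   # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean                       # coefficient of variation
# Median absolute deviation: median distance from the median.
mad = statistics.median(abs(x - median) for x in ids)

# Linear-interpolation quantiles; n=20 yields the 5%, 10%, ..., 95% cut points.
q1, _, q3 = statistics.quantiles(ids, n=4, method="inclusive")
twentieths = statistics.quantiles(ids, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

print("mean", mean, "median", median, "variance", variance)
print("std", round(std, 6), "CV", round(cv, 6), "MAD", mad)
print("5%", p5, "Q1", q1, "Q3", q3, "95%", p95, "IQR", q3 - q1)
```

Under these assumptions the recomputed values agree with the figures listed above, which suggests the column is a plain row counter rather than a feature with predictive content.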
Histogram with fixed size bins (bins=50)
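Common values

Value | Count | Frequency (%)
1 | 1 | 0.1%
599 | 1 | 0.1%
588 | 1 | 0.1%
589 | 1 | 0.1%
590 | 1 | 0.1%
591 | 1 | 0.1%
592 | 1 | 0.1%
593 | 1 | 0.1%
594 | 1 | 0.1%
595 | 1 | 0.1%
Other values (881) | 881 | 98.9%

Smallest values

Value | Count | Frequency (%)
1 | 1 | 0.1%
2 | 1 | 0.1%
3 | 1 | 0.1%
4 | 1 | 0.1%
5 | 1 | 0.1%
6 | 1 | 0.1%
7 | 1 | 0.1%
8 | 1 | 0.1%
9 | 1 | 0.1%
10 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
891 | 1 | 0.1%
890 | 1 | 0.1%
889 | 1 | 0.1%
888 | 1 | 0.1%
887 | 1 | 0.1%
886 | 1 | 0.1%
885 | 1 | 0.1%
884 | 1 | 0.1%
883 | 1 | 0.1%
882 | 1 | 0.1%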

Survived
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count
0 | 549
1 | 342

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 1
4th row: 1
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 549 | 61.6%
1 | 342 | 38.4%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value | Count | Frequency (%)
0 | 549 | 61.6%
1 | 342 | 38.4%

Most occurring characters, categories, scripts and blocks: no values found.

Pclass
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count
3 | 491
1 | 216
2 | 184

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 3
2nd row: 1
3rd row: 3
4th row: 1
5th row: 3

Common Values

Value | Count | Frequency (%)
3 | 491 | 55.1%
1 | 216 | 24.2%
2 | 184 | 20.7%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value | Count | Frequency (%)
3 | 491 | 55.1%
1 | 216 | 24.2%
2 | 184 | 20.7%

Most occurring characters, categories, scripts and blocks: no values found.

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 891
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count
Braund, Mr. Owen Harris | 1
Boulos, Mr. Hanna | 1
Frolicher-Stehli, Mr. Maxmillian | 1
Gilinski, Mr. Eliezer | 1
Murdlin, Mr. Joseph | 1
Other values (886) | 886

Length

Max length: 82
Median length: 25
Mean length: 26.96520763
Min length: 12

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 891
Unique (%): 100.0%

Sample

1st row: Braund, Mr. Owen Harris
2nd row: Cumings, Mrs. John Bradley (Florence Briggs Thayer)
3rd row: Heikkinen, Miss. Laina
4th row: Futrelle, Mrs. Jacques Heath (Lily May Peel)
5th row: Allen, Mr. William Henry

Common Values

Value | Count | Frequency (%)
Braund, Mr. Owen Harris | 1 | 0.1%
Boulos, Mr. Hanna | 1 | 0.1%
Frolicher-Stehli, Mr. Maxmillian | 1 | 0.1%
Gilinski, Mr. Eliezer | 1 | 0.1%
Murdlin, Mr. Joseph | 1 | 0.1%
Rintamaki, Mr. Matti | 1 | 0.1%
Stephenson, Mrs. Walter Bertram (Martha Eustis) | 1 | 0.1%
Elsbury, Mr. William James | 1 | 0.1%
Bourke, Miss. Mary | 1 | 0.1%
Chapman, Mr. John Henry | 1 | 0.1%
Other values (881) | 881 | 98.9%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
mr | 521 | 14.4%
miss | 182 | 5.0%
mrs | 129 | 3.6%
william | 64 | 1.8%
john | 44 | 1.2%
master | 40 | 1.1%
henry | 35 | 1.0%
george | 24 | 0.7%
james | 24 | 0.7%
charles | 23 | 0.6%
Other values (1515) | 2538 | 70.0%

Most occurring characters, categories, scripts and blocks: no values found.

Sex
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count
male | 577
female | 314

Length

Max length: 6
Median length: 4
Mean length: 4.704826038
Min length: 4

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: female
4th row: female
5th row: male

Common Values

Value | Count | Frequency (%)
male | 577 | 64.8%
female | 314 | 35.2%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value | Count | Frequency (%)
male | 577 | 64.8%
female | 314 | 35.2%

Most occurring characters, categories, scripts and blocks: no values found.

Age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 89
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29.69911765
Minimum: 0.42
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0.42
5-th percentile: 6
Q1: 22
Median: 29.69911765
Q3: 35
95-th percentile: 54
Maximum: 80
Range: 79.58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 13.00201523
Coefficient of variation (CV): 0.4377912967
Kurtosis: 0.9662793027
Mean: 29.69911765
Median Absolute Deviation (MAD): 6.300882353
Skewness: 0.434488094
Sum: 26461.91382
Variance: 169.0523999
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
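Common values

Value | Count | Frequency (%)
29.69911765 | 177 | 19.9%
24 | 30 | 3.4%
22 | 27 | 3.0%
18 | 26 | 2.9%
28 | 25 | 2.8%
30 | 25 | 2.8%
19 | 25 | 2.8%
21 | 24 | 2.7%
25 | 23 | 2.6%
36 | 22 | 2.5%
Other values (79) | 487 | 54.7%

Smallest values

Value | Count | Frequency (%)
0.42 | 1 | 0.1%
0.67 | 1 | 0.1%
0.75 | 2 | 0.2%
0.83 | 2 | 0.2%
0.92 | 1 | 0.1%
1 | 7 | 0.8%
2 | 10 | 1.1%
3 | 6 | 0.7%
4 | 10 | 1.1%
5 | 4 | 0.4%

Largest values

Value | Count | Frequency (%)
80 | 1 | 0.1%
74 | 1 | 0.1%
71 | 2 | 0.2%
70.5 | 1 | 0.1%
70 | 2 | 0.2%
66 | 1 | 0.1%
65 | 3 | 0.3%
64 | 2 | 0.2%
63 | 2 | 0.2%
62 | 4 | 0.4%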

SibSp
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5230078563
Minimum: 0
Maximum: 8
Zeros: 608
Zeros (%): 68.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 3
Maximum: 8
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.102743432
Coefficient of variation (CV): 2.108464374
Kurtosis: 17.88041973
Mean: 0.5230078563
Median Absolute Deviation (MAD): 0
Skewness: 3.695351727
Sum: 466
Variance: 1.216043077
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
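Common values

Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
4 | 18 | 2.0%
3 | 16 | 1.8%
8 | 7 | 0.8%
5 | 5 | 0.6%

Smallest values

Value | Count | Frequency (%)
0 | 608 | 68.2%
1 | 209 | 23.5%
2 | 28 | 3.1%
3 | 16 | 1.8%
4 | 18 | 2.0%
5 | 5 | 0.6%
8 | 7 | 0.8%

Largest values

Value | Count | Frequency (%)
8 | 7 | 0.8%
5 | 5 | 0.6%
4 | 18 | 2.0%
3 | 16 | 1.8%
2 | 28 | 3.1%
1 | 209 | 23.5%
0 | 608 | 68.2%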

Parch
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3815937149
Minimum: 0
Maximum: 6
Zeros: 678
Zeros (%): 76.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 6
Range: 6
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8060572211
Coefficient of variation (CV): 2.112344071
Kurtosis: 9.778125179
Mean: 0.3815937149
Median Absolute Deviation (MAD): 0
Skewness: 2.749117047
Sum: 340
Variance: 0.6497282437
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
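Common values

Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
5 | 5 | 0.6%
3 | 5 | 0.6%
4 | 4 | 0.4%
6 | 1 | 0.1%

Smallest values

Value | Count | Frequency (%)
0 | 678 | 76.1%
1 | 118 | 13.2%
2 | 80 | 9.0%
3 | 5 | 0.6%
4 | 4 | 0.4%
5 | 5 | 0.6%
6 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
6 | 1 | 0.1%
5 | 5 | 0.6%
4 | 4 | 0.4%
3 | 5 | 0.6%
2 | 80 | 9.0%
1 | 118 | 13.2%
0 | 678 | 76.1%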

Ticket
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 681
Distinct (%): 76.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count
347082 | 7
CA. 2343 | 7
1601 | 7
3101295 | 6
CA 2144 | 6
Other values (676) | 858

Length

Max length: 18
Median length: 6
Mean length: 6.750841751
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 547
Unique (%): 61.4%

Sample

1st row: A/5 21171
2nd row: PC 17599
3rd row: STON/O2. 3101282
4th row: 113803
5th row: 373450

Common Values

Value | Count | Frequency (%)
347082 | 7 | 0.8%
CA. 2343 | 7 | 0.8%
1601 | 7 | 0.8%
3101295 | 6 | 0.7%
CA 2144 | 6 | 0.7%
347088 | 6 | 0.7%
S.O.C. 14879 | 5 | 0.6%
382652 | 5 | 0.6%
LINE | 4 | 0.4%
PC 17757 | 4 | 0.4%
Other values (671) | 834 | 93.6%

Length

[Figure: histogram of lengths of the category]

Words

Value | Count | Frequency (%)
pc | 60 | 5.3%
c.a | 27 | 2.4%
a/5 | 17 | 1.5%
ca | 14 | 1.2%
ston/o | 12 | 1.1%
2 | 12 | 1.1%
sc/paris | 9 | 0.8%
w./c | 9 | 0.8%
soton/o.q | 8 | 0.7%
347082 | 7 | 0.6%
Other values (709) | 955 | 84.5%

Most occurring characters, categories, scripts and blocks: no values found.

Fare
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 248
Distinct (%): 27.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.20420797
Minimum: 0
Maximum: 512.3292
Zeros: 15
Zeros (%): 1.7%
Negative: 0
Negative (%): 0.0%
Memory size: 7.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7.225
Q1: 7.9104
Median: 14.4542
Q3: 31
95-th percentile: 112.07915
Maximum: 512.3292
Range: 512.3292
Interquartile range (IQR): 23.0896

Descriptive statistics

Standard deviation: 49.6934286
Coefficient of variation (CV): 1.543072528
Kurtosis: 33.39814088
Mean: 32.20420797
Median Absolute Deviation (MAD): 6.9042
Skewness: 4.78731652
Sum: 28693.9493
Variance: 2469.436846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
8.05 | 43 | 4.8%
13 | 42 | 4.7%
7.8958 | 38 | 4.3%
7.75 | 34 | 3.8%
26 | 31 | 3.5%
10.5 | 24 | 2.7%
7.925 | 18 | 2.0%
7.775 | 16 | 1.8%
7.2292 | 15 | 1.7%
0 | 15 | 1.7%
Other values (238) | 615 | 69.0%

Smallest values

Value | Count | Frequency (%)
0 | 15 | 1.7%
4.0125 | 1 | 0.1%
5 | 1 | 0.1%
6.2375 | 1 | 0.1%
6.4375 | 1 | 0.1%
6.45 | 1 | 0.1%
6.4958 | 2 | 0.2%
6.75 | 2 | 0.2%
6.8583 | 1 | 0.1%
6.95 | 1 | 0.1%

Largest values

Value | Count | Frequency (%)
512.3292 | 3 | 0.3%
263 | 4 | 0.4%
262.375 | 2 | 0.2%
247.5208 | 2 | 0.2%
227.525 | 4 | 0.4%
221.7792 | 1 | 0.1%
211.5 | 1 | 0.1%
211.3375 | 3 | 0.3%
164.8667 | 2 | 0.2%
153.4625 | 3 | 0.3%

Embarked
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.1 KiB

Value | Count
S | 646
C | 168
Q | 77

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: S
2nd row: C
3rd row: S
4th row: S
5th row: S

Common Values

Value | Count | Frequency (%)
S | 646 | 72.5%
C | 168 | 18.9%
Q | 77 | 8.6%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value | Count | Frequency (%)
s | 646 | 72.5%
c | 168 | 18.9%
q | 77 | 8.6%

Most occurring characters, categories, scripts and blocks: no values found.

Age_cut
Categorical

HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): 0.3%
Missing: 14
Missing (%): 1.6%
Memory size: 1.1 KiB

Value | Count
성년 (adult) | 690
미성년 (minor) | 165
노년 (elderly) | 22

Length

Max length: 3
Median length: 2
Mean length: 2.188141391
Min length: 2

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 성년
2nd row: 성년
3rd row: 성년
4th row: 성년
5th row: 성년

Common Values

Value | Count | Frequency (%)
성년 | 690 | 77.4%
미성년 | 165 | 18.5%
노년 | 22 | 2.5%
(Missing) | 14 | 1.6%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]

Words

Value | Count | Frequency (%)
성년 | 690 | 78.7%
미성년 | 165 | 18.8%
노년 | 22 | 2.5%

Most occurring characters, categories, scripts and blocks: no values found.

Interactions

[Figures: 25 pairwise interaction plots]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
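
The calculation described above can be sketched in plain Python. This is an illustrative sketch, not the profiler's own implementation: the helper names are hypothetical, ties are assumed to receive the average of their ranks, and the data are the Pclass and Fare values of the ten "First rows" shown in the Sample section of this report.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations.

    The 1/(n - 1) factors cancel, so plain sums are used.
    """
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Pclass and Fare for the first ten rows of the sample.
pclass = [3, 1, 3, 1, 3, 3, 1, 3, 3, 2]
fare = [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075, 11.1333, 30.0708]

rho = spearman(pclass, fare)
print(round(rho, 4))
```

On these ten rows ρ comes out strongly negative (a higher class number goes with a lower fare), in line with the Pclass and Fare alert; the value for the full dataset will differ.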

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
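
A minimal sketch of this formula, together with a check of the invariance property mentioned above. The function name is hypothetical, the sample (n - 1) covariance and standard deviations are assumed, and the rescaling uses positive scale factors (a negative factor would flip the sign of r). The data are again the Pclass and Fare values of the ten "First rows" in the Sample section.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)

pclass = [3, 1, 3, 1, 3, 3, 1, 3, 3, 2]
fare = [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075, 11.1333, 30.0708]

r = pearson(pclass, fare)

# Shift and rescale each variable separately; r should be unchanged.
r_rescaled = pearson([2 * p + 5 for p in pclass], [f / 100 - 1 for f in fare])

print(round(r, 4), round(r_rescaled, 4))
```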

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
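
The pair-counting definition above can be written out directly. Note the assumption: this sketch implements exactly the formula as worded (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries frequently default to a tie-adjusted variant (tau-b), so the values in the report's heatmap need not match this sketch. The function name is hypothetical and the data are the Pclass and Fare values of the ten "First rows" in the Sample section.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0: tied in x or y, counted in neither group
    return (concordant - discordant) / len(pairs)

pclass = [3, 1, 3, 1, 3, 3, 1, 3, 3, 2]
fare = [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075, 11.1333, 30.0708]

tau = kendall_tau_a(pclass, fare)
print(tau)
```

Of the 45 pairs in this small sample, 18 are tied on Pclass and the remaining 27 are all discordant, so the sketch yields (0 - 27) / 45.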

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
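
A sketch of a bias-corrected Cramér's V in the form proposed by Bergsma, written with the standard library only. This is an assumption-laden illustration rather than the profiler's code: the function name is hypothetical, the chi-squared statistic is computed without continuity correction, and the data are the Sex and Survived values of the ten "First rows" in the Sample section.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long lists of category labels."""
    n = len(x)
    observed = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

sex = ["male", "female", "female", "female", "male",
       "male", "male", "male", "female", "female"]
survived = [0, 1, 1, 1, 0, 0, 0, 0, 1, 1]

v = cramers_v_corrected(sex, survived)
print(round(v, 4))
```

In these ten rows every female passenger survived and every male passenger did not, so the association is perfect; the full dataset is less clear-cut, which is why the report flags the pair as highly, not perfectly, correlated.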

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

[Figure: count] A simple visualization of nullity by column.
[Figure: matrix] The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
[Figure: dendrogram] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

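Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked | Age_cut
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.000000 | 1 | 0 | A/5 21171 | 7.2500 | S | 성년
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.000000 | 1 | 0 | PC 17599 | 71.2833 | C | 성년
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.000000 | 0 | 0 | STON/O2. 3101282 | 7.9250 | S | 성년
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.000000 | 1 | 0 | 113803 | 53.1000 | S | 성년
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.000000 | 0 | 0 | 373450 | 8.0500 | S | 성년
5 | 6 | 0 | 3 | Moran, Mr. James | male | 29.699118 | 0 | 0 | 330877 | 8.4583 | Q | 성년
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.000000 | 0 | 0 | 17463 | 51.8625 | S | 성년
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.000000 | 3 | 1 | 349909 | 21.0750 | S | 미성년
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.000000 | 0 | 2 | 347742 | 11.1333 | S | 성년
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.000000 | 1 | 0 | 237736 | 30.0708 | C | 미성년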

Last rows

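Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Embarked | Age_cut
881 | 882 | 0 | 3 | Markun, Mr. Johann | male | 33.000000 | 0 | 0 | 349257 | 7.8958 | S | 성년
882 | 883 | 0 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22.000000 | 0 | 0 | 7552 | 10.5167 | S | 성년
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.000000 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | S | 성년
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.000000 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | S | 성년
885 | 886 | 0 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39.000000 | 0 | 5 | 382652 | 29.1250 | Q | 성년
886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.000000 | 0 | 0 | 211536 | 13.0000 | S | 성년
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.000000 | 0 | 0 | 112053 | 30.0000 | S | 미성년
888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | 29.699118 | 1 | 2 | W./C. 6607 | 23.4500 | S | 성년
889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.000000 | 0 | 0 | 111369 | 30.0000 | C | 성년
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.000000 | 0 | 0 | 370376 | 7.7500 | Q | 성년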